I/O-optimal Algorithms for Orthogonal Problems for Private-Cache Chip Multiprocessors

Authors

  • Deepak Ajwani
  • Nodari Sitchinava
  • Norbert Zeh
Abstract

The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. Recently, a parallel version of the distribution sweeping framework was introduced to efficiently solve a number of orthogonal geometric problems in the PEM model. In this paper, we improve the framework to the optimal O(sort_P(N) + K/PB) I/Os, where P is the number of cores/processors, B is the number of elements that fit into a cache line, N and K are the sizes of the input and output, respectively, and sort_P(N) denotes the I/O complexity of sorting N items on a P-processor PEM model. We achieve this with a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves O((N + K)/PB) I/O complexity, where K is the sum of counts of all the ranges. The key to achieving efficient load balancing among the processors for this problem is a new method to count the output without enumerating it, which might be of independent interest.

Keywords: parallel external memory, PEM, multicore algorithms, computational geometry, parallel distribution sweeping
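To make the one-dimensional batched range counting problem concrete, here is a minimal sequential sketch in Python. It is not the paper's PEM algorithm: it uses binary search for brevity, whereas the abstract describes an approach with O((N + K)/PB) I/Os on P processors. The function names `batched_range_counts` and `assign_workers` are hypothetical and chosen only for illustration. The sketch shows the two ideas the abstract mentions: obtaining each range's count without listing the points inside it, and using those counts to spread the K output items roughly evenly over P workers.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def batched_range_counts(points, ranges):
    """Count, for each closed range (lo, hi), the sorted points inside it.

    Only positions are compared, so no point is ever enumerated.
    """
    return [bisect_right(points, hi) - bisect_left(points, lo)
            for lo, hi in ranges]


def assign_workers(counts, p):
    """Map each range to one of p workers by its offset in the output.

    A range whose output starts at offset s (out of K total items) goes to
    worker floor(s * p / K). This is an illustrative balancing rule, not the
    scheme from the paper.
    """
    total = sum(counts)
    if total == 0:
        return [0] * len(counts)
    # Exclusive prefix sums: where each range's output would begin.
    starts = [0] + list(accumulate(counts))[:-1]
    return [min(p - 1, s * p // total) for s in starts]


points = [1, 3, 4, 7, 9, 12]                 # sorted input points
ranges = [(0, 4), (3, 9), (8, 8), (5, 20)]   # closed query ranges

counts = batched_range_counts(points, ranges)
print(counts)                     # [3, 4, 0, 3], so K = 10
print(assign_workers(counts, 2))  # [0, 0, 1, 1]
```

Because the counts are known before any output is written, each worker can compute where its share of the output begins, which is the kind of load balancing the abstract refers to.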

Similar resources

Geometric Algorithms for Private-Cache Chip Multiprocessors

We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort ...

A Reusability-Aware Cache Memory Sharing Technique for High Performance CMPs with Private L2 Caches

For high-performance chip multiprocessors (CMPs) to achieve their maximum performance potential, an efficient support for memory hierarchy is important. Since off-chip accesses require a long latency, high-performance CMPs are typically based on multiple levels of on-chip cache memories. For example, most current CMPs support two levels of on-chip caches. While the L1 cache architecture of thes...

Utilization of Cache Area in On-Chip Multiprocessor

An on-chip multiprocessor can be an alternative to the wide-issue superscalar processor approach, which is currently the mainstream way to exploit the increasing number of transistors on a silicon chip. Utilization of the cache, especially for remote data, is important in systems using such on-chip multiprocessors since the ratio of the off-chip and the on-chip memory access latencies is higher th...

Optimal Placement of Cores, Caches and Memory Controllers in On-Chip Network

Parallel programming is emerging fast and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing L1 caches beside the cores are the fastest after registers but the size of private caches cannot increase because of design, cost and technology limits. Then split I-cache and D-cache are used with shared LLC (last level cache). For a unified sha...

Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip

Parallel programming is emerging fast and intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. Accessing L1 caches beside the cores are the fastest after registers but the size of private caches cannot increase because of design, cost and technology limits. Then split I-cache and D-cache are used with shared LLC (last level cache). For a unified sha...

Publication date: 2010